Skip to content

fix(drive): write files.download --output contents#805

Open
Lubrsy706 wants to merge 7 commits into
googleworkspace:mainfrom
Lubrsy706:fix/drive-download-output
Open

fix(drive): write files.download --output contents#805
Lubrsy706 wants to merge 7 commits into
googleworkspace:mainfrom
Lubrsy706:fix/drive-download-output

Conversation

@Lubrsy706
Copy link
Copy Markdown

Fixes #789.

Summary

  • When drive.files.download returns an operation JSON with a downloadUri and the caller provided --output, follow the Google download URI and stream the actual bytes to the requested output file.
  • Preserve existing JSON output behavior for other methods and for files.download calls without --output.
  • Reject non-Google downloadUri values before following them.

Testing

  • cargo fmt --check
  • cargo test -p google-workspace-cli test_extract_
  • cargo test -p google-workspace-cli executor::tests

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 14, 2026

⚠️ No Changeset found

Latest commit: 4fb470a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the CLI's file download capabilities by enabling automatic resolution of Google Drive download URIs. When a user specifies an output file for a 'drive.files.download' operation, the CLI now intelligently parses the API response to locate the download URL, validates its origin, and streams the binary data to the destination, while maintaining existing behavior for other API calls.

Highlights

  • Download URI Resolution: Implemented logic to automatically extract and follow 'downloadUri' or 'downloadUrl' fields from API responses when the 'drive.files.download' method is called with an '--output' flag.
  • Security Validation: Added a validation layer to ensure that the extracted download URI belongs to authorized Google domains before attempting to follow it.
  • Streaming Support: Integrated binary response handling to stream file contents directly to the specified output path upon successful resolution of the download URI.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Generative AI Prohibited Use Policy, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements logic to handle file downloads for Google Drive by extracting download URIs from API responses and executing the subsequent download request. The changes include URI validation to ensure only Google-hosted links are followed. Feedback indicates that the new download request should include the "x-goog-user-project" header for consistent quota and billing attribution, similar to other API calls in the executor.

Comment on lines +544 to +549
let mut download_request = client.get(download_uri);
if let Some(token) = token {
if auth_method == AuthMethod::OAuth {
download_request = download_request.bearer_auth(token);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The download request for the downloadUri is missing the x-goog-user-project header, which is required by many Google APIs for quota and billing attribution when using a service account or specific project-based credentials. For consistency with the primary API request built in build_http_request, this header should be included if a quota project is configured.

                    let mut download_request = client.get(download_uri);
                    if let Some(quota_project) = crate::auth::get_quota_project() {
                        download_request = download_request.header("x-goog-user-project", quota_project);
                    }
                    if let Some(token) = token {
                        if auth_method == AuthMethod::OAuth {
                            download_request = download_request.bearer_auth(token);
                        }
                    }

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces specialized handling for Google Drive file downloads by extracting download URIs from API responses and performing secondary requests to retrieve the file content. It includes security checks to validate that download URIs belong to Google domains and refactors quota project header logic into a helper function. Review feedback suggests ensuring the secondary download request inherits all necessary headers from the client and improving error context when reading failed download response bodies.


if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
if let Some(download_uri) = extract_google_download_uri(&body_text)? {
let mut download_request = add_quota_project_header(client.get(download_uri));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The download request is built using only the download_uri and the quota project header. It does not inherit other potentially important headers from the original request, such as User-Agent or custom headers that might be required by the storage backend. Ensure that the client used (built at line 481) has the necessary default headers configured.

.to_string();

if !download_status.is_success() {
let error_body = download_response.text().await.unwrap_or_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using unwrap_or_default() when reading the error body might mask the actual error if the body cannot be read. While handle_error_response handles empty strings, it would be better to provide context if the body read fails, similar to how it's done for the main response body at line 537.

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements support for Google Drive file downloads by extracting download URIs from API responses and performing secondary requests. It includes domain validation for security, refactors quota header logic into a helper function, and adds comprehensive unit tests. A critical feedback point was raised regarding the potential for 400 Bad Request errors when applying Authorization headers to Signed URLs, which already contain authentication parameters.

Comment on lines +640 to +644
if let Some(token) = token {
if *auth_method == AuthMethod::OAuth {
request = request.bearer_auth(token);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Adding an Authorization header to a Google download URI can cause a 400 Bad Request if the URI is a "Signed URL" (common for large file downloads from Google Cloud Storage). Signed URLs already contain authentication in their query parameters (e.g., GoogleAccessId or X-Goog-Signature), and providing a Bearer token simultaneously creates an authentication conflict. You should only add the bearer token if the URL does not appear to be signed.

    if let Some(token) = token {
        if *auth_method == AuthMethod::OAuth {
            // Only add Authorization header if the URL doesn't appear to be a GCS Signed URL.
            // Signed URLs contain authentication in query parameters, and adding a Bearer
            // token causes a 400 Bad Request conflict.
            let is_signed_url = reqwest::Url::parse(download_uri)
                .map(|u| {
                    u.query_pairs().any(|(k, _)| {
                        k == "GoogleAccessId" || k == "Signature" || k.starts_with("X-Goog-")
                    })
                })
                .unwrap_or(false);

            if !is_signed_url {
                request = request.bearer_auth(token);
            }
        }
    }

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logic to handle secondary downloads for Google Drive by extracting download URIs from API responses and executing follow-up requests. It includes helper functions for URI validation and header management, along with corresponding unit tests. A security concern was raised regarding the potential leakage of Bearer tokens if the CLI misidentifies user-controlled JSON as an API response; it is recommended to verify the response type (e.g., checking for a 'kind' or 'done' field) before following URIs. Additionally, the method ID 'drive.files.download' should be verified against the Discovery Document to ensure the logic triggers as intended.

Comment on lines +542 to +543
if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
if let Some(download_uri) = extract_google_download_uri(&body_text)? {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

There are two significant concerns with this logic:

  1. Method ID Accuracy: Please verify that drive.files.download is the correct method ID. In the standard Google Drive API v3 Discovery Document, the method for downloading file content is drive.files.get (typically used with alt=media). If the ID is actually drive.files.get, this block will never execute.
  2. Security/Correctness Risk: Following a downloadUri from a JSON body is dangerous if the response is actually the file content (e.g., a user-controlled JSON file) rather than an API Operation or metadata. If a user's file contains a downloadUri field pointing to a malicious but Google-hosted domain (like a subdomain of googleusercontent.com), the CLI might follow it and leak the user's Bearer token.

To fix this, you should verify that the JSON response is indeed an API response (e.g., by checking for a "kind": "drive#operation" field or the presence of the "done" field characteristic of Google Operations) before attempting to extract and follow a download URI.

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements automatic following of download URIs for Google Drive file downloads. It adds several helper functions to extract and validate download URIs, ensuring they belong to trusted Google domains before proceeding. The implementation also refactors quota project header handling and includes comprehensive unit tests. Review feedback suggests simplifying the host validation logic in is_google_download_uri to remove redundant checks and explicitly include the base domain.

Comment on lines +416 to +418
host == "storage.googleapis.com"
|| host.ends_with(".googleapis.com")
|| host.ends_with(".googleusercontent.com")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The check host == "storage.googleapis.com" is redundant because host.ends_with(".googleapis.com") already covers it. To ensure the host is either exactly the base domain or a proper subdomain, the check should include the base domain explicitly. Suggestions to add additional domains like googleusercontent.com are omitted to avoid scope creep.

Suggested change
host == "storage.googleapis.com"
|| host.ends_with(".googleapis.com")
|| host.ends_with(".googleusercontent.com")
host == "googleapis.com"
|| host.ends_with(".googleapis.com")
References
  1. Avoid introducing changes that are outside the primary goal of a pull request to prevent scope creep.

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logic to handle Google Drive downloads by detecting download URIs in API responses and performing secondary requests. It includes refactoring for quota headers and comprehensive unit tests. The review feedback highlights critical security and functional concerns, specifically regarding overly permissive host validation that could lead to token exfiltration, the improper inclusion of authorization and quota headers in requests to GCS or signed URLs, and the need for more specific detection of Drive operations to prevent false positives and SSRF risks.

Comment on lines +405 to +417
fn is_google_download_uri(uri: &str) -> bool {
let Ok(url) = reqwest::Url::parse(uri) else {
return false;
};
if url.scheme() != "https" || !url.username().is_empty() || url.password().is_some() {
return false;
}
let Some(host) = url.host_str() else {
return false;
};

host == "googleapis.com" || host.ends_with(".googleapis.com")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The host validation logic is overly permissive and susceptible to token exfiltration. By allowing any host ending in .googleapis.com, the CLI will trust attacker-controlled subdomains such as attacker-bucket.storage.googleapis.com. An attacker could share a malicious JSON file that mimics a Drive operation, causing the CLI to send the user's OAuth Bearer token to the attacker's bucket during a download.

Restrict the allowed hosts to a strict allowlist of trusted Google API domains.

fn is_google_download_uri(uri: &str) -> bool {
    let Ok(url) = reqwest::Url::parse(uri) else {
        return false;
    };
    if url.scheme() != "https" || !url.username().is_empty() || url.password().is_some() {
        return false;
    }
    let Some(host) = url.host_str() else {
        return false;
    };

    // Restrict to known trusted Google API hosts to prevent token exfiltration 
    // to attacker-controlled subdomains (e.g., GCS virtual-hosted buckets).
    matches!(host, "googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
}

Comment on lines +653 to +668
fn build_download_request(
client: &reqwest::Client,
download_uri: &str,
token: Option<&str>,
auth_method: &AuthMethod,
) -> reqwest::RequestBuilder {
// Keep secondary Drive downloads on the same reqwest client so they use
// the same client-level configuration as API requests.
let mut request = add_quota_project_header(client.get(download_uri));
if let Some(token) = token {
if *auth_method == AuthMethod::OAuth && !is_signed_download_uri(download_uri) {
request = request.bearer_auth(token);
}
}
request
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

This implementation has two significant issues:

  1. Signed URL Compatibility: Adding the x-goog-user-project header to a signed URL (like those from GCS) will cause a SignatureDoesNotMatch (403) error because the header was not part of the original signature.
  2. Security (Token Leakage): Sending Bearer tokens to storage.googleapis.com is dangerous as it allows token exfiltration to user-controlled buckets. Bearer tokens should only be sent to the main API surface (www.googleapis.com). GCS downloads should rely on signed URLs which do not require the Authorization header.
fn build_download_request(
    client: &reqwest::Client,
    download_uri: &str,
    token: Option<&str>,
    auth_method: &AuthMethod,
) -> reqwest::RequestBuilder {
    let mut request = client.get(download_uri);
    let is_signed = is_signed_download_uri(download_uri);

    // Only add auth/quota headers if the URI is NOT a signed URL.
    if !is_signed {
        request = add_quota_project_header(request);
        if let Some(token) = token {
            // Only send Bearer tokens to the main API surface. 
            // GCS downloads (storage.googleapis.com) should use signed URLs.
            let is_api_host = reqwest::Url::parse(download_uri)
                .map(|u| matches!(u.host_str(), Some("www.googleapis.com" | "googleapis.com")))
                .unwrap_or(false);

            if *auth_method == AuthMethod::OAuth && is_api_host {
                request = request.bearer_auth(token);
            }
        }
    }
    request
}

Comment on lines +397 to +403
fn is_drive_download_operation(json_val: &Value) -> bool {
json_val.get("done").is_some()
|| json_val
.get("kind")
.and_then(Value::as_str)
.is_some_and(|kind| kind == "drive#operation")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Checking for the presence of a done field is too generic and can lead to false positives when downloading user-provided JSON files. This increases the risk of the CLI misinterpreting file content as an API operation. Rely on the more specific kind field or other unique markers of a Drive operation response.

fn is_drive_download_operation(json_val: &Value) -> bool {
    json_val
        .get("kind")
        .and_then(Value::as_str)
        .is_some_and(|kind| kind == "drive#operation")
}

.context("Failed to read response body")?;

if output_path.is_some() && method.id.as_deref() == Some("drive.files.download") {
if let Some(download_uri) = extract_google_download_uri(&body_text)? {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

There is a security risk here: if a user downloads a malicious JSON file that mimics a Drive operation response, the CLI will follow the downloadUri contained within that file. While the host check and token restrictions (suggested elsewhere) mitigate token theft, this still constitutes a Server-Side Request Forgery (SSRF) where the CLI makes unexpected requests based on untrusted content. Consider adding a check to ensure the response is an actual API response (e.g., by verifying the presence of Google-specific API headers like X-Goog-Api-Client) before attempting to extract a download URI.

@googleworkspace-bot
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements support for following redirect URIs in Google Drive download operations. It introduces logic to extract download URIs from API responses, validate them against allowed Google domains, and perform the subsequent download request with appropriate authentication and quota headers. Review feedback suggests expanding the domain whitelist to include googleusercontent.com subdomains and allowing authentication tokens to be sent to storage.googleapis.com to correctly handle redirects to Google Cloud Storage.

Comment on lines +414 to +419
fn is_google_download_uri(uri: &str) -> bool {
matches!(
parse_download_uri_host(uri).as_deref(),
Some("googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The whitelist for following download URIs is too restrictive and will likely break legitimate Google Drive downloads. Google frequently uses various subdomains for content delivery, such as drive.googleusercontent.com or dynamically generated subdomains like doc-0k-8s-docs.googleusercontent.com.

It is safer to allow any subdomain of googleapis.com and googleusercontent.com for the purpose of following the redirect, as long as the authentication token is only sent to a more strictly validated set of hosts (which is handled separately in is_google_api_download_host).

fn is_google_download_uri(uri: &str) -> bool {
    parse_download_uri_host(uri).as_deref().is_some_and(|host| {
        host == "googleapis.com" || host.ends_with(".googleapis.com") || host.ends_with(".googleusercontent.com")
    })
}

Comment on lines +421 to +426
fn is_google_api_download_host(uri: &str) -> bool {
matches!(
parse_download_uri_host(uri).as_deref(),
Some("googleapis.com" | "www.googleapis.com")
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The storage.googleapis.com host should be included in the list of hosts allowed to receive the Bearer authentication token. Many Drive download operations redirect to Google Cloud Storage (GCS). If the GCS URL is not a signed URL (e.g., it's a direct link to a private object), the request will fail with a 401/403 error unless the OAuth2 token is provided.

Note that exact matching on storage.googleapis.com is safe, as it prevents sending the token to virtual-hosted style buckets (e.g., bucket.storage.googleapis.com) which could be attacker-controlled.

Suggested change
fn is_google_api_download_host(uri: &str) -> bool {
matches!(
parse_download_uri_host(uri).as_deref(),
Some("googleapis.com" | "www.googleapis.com")
)
}
fn is_google_api_download_host(uri: &str) -> bool {
matches!(
parse_download_uri_host(uri).as_deref(),
Some("googleapis.com" | "www.googleapis.com" | "storage.googleapis.com")
)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gws drive files download reports success but does not create --output file

2 participants